Abstract 

Since prehistoric times, humans have used forests as a reservoir of food, drugs and handicraft materials. Today, 
the use of botanical raw material to produce pharmaceuticals, herbal remedies, teas, spirits, cosmetics, 
sweets, dietary supplements, special industrial compounds and crude materials constitutes an important 
global resource in terms of healthcare and economy. In recent years, DNA barcoding has been suggested as 
a useful molecular technique to complement traditional taxonomic expertise for fast species identification 
and biodiversity inventories. In this study, in situ application of DNA barcodes was tested on a selected 
group of forest tree species with the aim of contributing to the identification, conservation and trade 
control of these valuable plant resources. 

The "core barcode" for land plants (rbcL, matK, and trnH-psbA) was tested on 68 tree specimens 
(24 taxa). Universality of the method, ease of data retrieval and correct species assignment using sequence 
character states, presence of DNA barcoding gaps and GenBank discrimination assessment were evaluated. 
The markers showed different prospects of reliable applicability. RbcL and trnH-psbA displayed 100% 
amplification and sequencing success, while matK did not amplify in some plant groups. The majority of 
species had a single haplotype. The trnH-psbA region showed the highest genetic variability, but in most 
cases the high intraspecific sequence divergence revealed the absence of a clear DNA barcoding gap. We 
also faced an important limitation because the taxonomic coverage of the public reference database is 
incomplete. Overall, species identification success was 66.7%. 

This work illustrates current limitations in the applicability of DNA barcoding to taxonomic forest 
surveys. These difficulties call for improved technical protocols and a larger number 
of sequences and taxa in public databases. 
Introduction 

Forests figure prominently among the world's most important ecosystems. The importance 
of trees in sustaining biodiversity and habitat stability, and in providing a large 
variety of environmental services, is well acknowledged. Nevertheless, increasing human 
impact, recent environmental decay and ongoing climate change are among 
the main factors affecting forest communities, especially at local and regional scales 
within the Mediterranean basin (FOREST EUROPE, UNECE and FAO 2011). In the 
meantime, international market pressures call for higher quality standards. One way to 
convince decision-makers of the importance of conserving wild plants and habitats is to 
demonstrate their economic potential (Kathe 2006). The socio-economic contribution of 
forests to livelihood and the impact of their use on the environment are essential compo- 
nents of modern concepts for sustainable forest management (Arnold and Perez 2001). 

Temperate and boreal forests are a traditional source, not only for timber, but 
also for many products that have been extracted from forests for millennia, includ- 
ing resin, tannin, fodder, litter, medical plants, fruits, nuts, roots, mushrooms, seeds, 
honey, ornamentals and exudates. Today there is an institutional rediscovery of the 
value of forest products and services other than timber, and the total value of Non- 
Wood Goods (NWGs) reported in Europe has almost tripled since 2007 (FOREST 
EUROPE, UNECE and FAO 2011). 

Besides wood trade, Mediterranean woody flora includes numerous valuable spe- 
cies used as ornamentals or for secondary products processing and marketing (edibles, 
industrial and medicinal compounds). The option of stimulating the production of 
non-timber forest products has long been considered promising (Arnold and Perez 
2001, Wunder 2001), and it is well illustrated in the case of Medicinal and Aromatic 
Plants (MAPs). In many Euro-Mediterranean countries MAPs resources are still un- 
known or overlooked (Lange 2006). In other countries, the necessary plant materials 
(roots, bark, leaves, fruits and seeds) are generally collected and sold by local people to 
traders and to the industry. Final products are then purchased by international export- 
ers (WHO 2003). Forest overexploitation, product forgery and misidentifications are 
common risks, with the latter two usually occurring as a result of morphologically indis- 
tinguishable materials, species with similar common names, or intentional substitution 
of economically valuable materials by inexpensive specimens. At the same time, plant 
misidentification and forgery are serious threats to human health (Vanherweghem et 
al. 1993, Barthelson et al. 2006, Sundus 2008). The identification of herbal medicinal 
materials using traditional, organoleptic and chemical methods can be difficult, par- 
ticularly for processed materials of a plant (Govindaraghavan et al. 2012). Also plant 
germplasm (seeds and seedlings) purchased for the establishment of MAPs orchards, 
afforestation programs, and ornamentals, may be difficult to recognize. Therefore, an 
accurate, universal, stable and specific method allowing non-specialists to identify the 
source species from a tiny amount of tissue is needed. 

Molecular technology is considered a reliable alternative tool for the identification 
of plant species (e.g. Savolainen et al. 2000) and DNA barcoding is the latest move to- 
wards the generation of universal standards (Kane and Cronk 2008). A DNA barcode 
is a universally accepted short DNA sequence allowing the prompt and unambiguous 
identification of species (Savolainen et al. 2005), promoted for a variety of biological 
applications (Hollingsworth et al. 2011), including biodiversity inventories (Costion 
et al. 2011, de Vere et al. 2012), the identification of medicinal plants (Heubl et al. 
2010), of natural health products (Wallace et al. 2012), and of tree species listed in 
the Convention on International Trade in Endangered Species (Muellner et al. 2011). 

Based on the relative ease of amplification, sequencing, multi-alignment and the 
amount of variation displayed (sufficient to discriminate among sister species without 
affecting their correct assignation through intraspecific variation), three plastid loci are 
currently used in plants: rbcL (a universal but slowly evolving coding region), matK 
(a relatively fast evolving coding region) and trnH-psbA (a rapidly evolving intergenic 
spacer) (CBOL Plant Working Group 2009). More recently, the nuclear ribosomal 
internal transcribed spacer (ITS) has also been suggested as an efficient barcoding locus 
for complex plant groups (Hollingsworth et al. 2011). 

Tree taxa have peculiar biological, evolutionary and taxonomic features that are 
likely to constitute a challenge to species recognition through DNA barcodes, viz. the 
generally low mutation rate of the plastid DNA, their ability to hybridize, and their 
narrowly defined species limits (Petit and Hampe 2006). Nevertheless, DNA barcod- 
ing has proven its utility in several detailed studies of tree genera (Newmaster et al. 
2008, Newmaster and Ragupathy 2009, Kress et al. 2009, 2010, Ren et al. 2010, Roy 
et al. 2010, Liu et al. 2011). In this study, DNA barcoding was 
applied in situ to a number of indigenous and introduced tree species in the Mediterranean 
area, with medicinal, ornamental, edible, industrial and conservation relevance. Taxa 
were analysed with the core barcode for land plants (rbcL, matK, and trnH-psbA); the ease 
and success of achieving correct species identification were evaluated based on the relative 
efficiency of each marker, data quality and representation in the GenBank/EMBL 
database. Our final objective is to provide a contribution to the future assemblage of a 
regional data/species inventory in the Mediterranean area for adequate identification, 
conservation and trade control of these valuable resources. 



Materials and methods 

Plant material and molecular analyses 

Sixty-eight trees belonging to 24 species (ten genera, nine families) were sampled in the 
wild (Italy, Greece and adjacent areas) and/or Botanic Gardens (Table 1). Plants were 
identified directly in the field. Herbarium specimens and lyophilized green tissues of 
the collected material were vouchered and preserved at the Mediterranean Forest DNA 
bank of the University of Tuscia (www.Medna-bank.eu). 

Table 1. Sample list. 

Family          Genus       Species        Relevance                               No. of samples
Pinaceae        Cedrus      atlantica      Ornamental/afforestation                3
Pinaceae        Cedrus      deodara        Ornamental/afforestation                3
Pinaceae        Cedrus      libani         Ornamental/afforestation/conservation   3
Rosaceae        Crataegus   monogyna       Medicinal/ornamental                    3
Rosaceae        Crataegus   oxyacantha     Medicinal/ornamental                    2
Rosaceae        Crataegus   azarolus       Food industry/conservation              4
Rosaceae        Sorbus      aria           /                                       3
Rosaceae        Sorbus      aucuparia      Ornamental/conservation                 2
Rosaceae        Sorbus      domestica      Medicinal/food industry                 3
Rosaceae        Sorbus      torminalis     Valuable wood industry                  3
Sapindaceae     Aesculus    hippocastanum  Medicinal/ornamental                    3
Sapindaceae     Aesculus    indica         /                                       3
Oleaceae        Fraxinus    ornus          Medicinal/food industry                 5
Oleaceae        Fraxinus    angustifolia   /                                       3
Oleaceae        Fraxinus    excelsior      /                                       2
Adoxaceae       Sambucus    nigra          Medicinal                               5
Adoxaceae       Sambucus    ebulus         /                                       2
Adoxaceae       Sambucus    racemosa                                               1
Passifloraceae  Passiflora  incarnata      Medicinal/ornamental                    2
Passifloraceae  Passiflora  edulis         Food industry                           1
Lythraceae      Punica      granatum       Medicinal/food industry/ornamental      4
Rhamnaceae      Ziziphus    jujuba         Medicinal/food industry                 3
Aquifoliaceae   Ilex        aquifolium     Medicinal/ornamental/conservation       4
Aquifoliaceae   Ilex        latifolia      /                                       1

DNA extractions were performed with the DNeasy Plant Minikit (QIAGEN), 
following the manufacturer's instructions. The universal applicability of the technical 
analyses was considered a prerequisite for exploring the DNA barcoding potential in a 
practical floristic case study: uniform PCR procedures were thus performed for all taxa 
and barcoding loci. Genomic DNAs (ca. 40 ng) were amplified with RTG PCR beads 
(GE Healthcare) in 25 µL final volume according to the manufacturer's protocol. Thermocycling 
conditions were as follows: 94 °C for 3 min, followed by 35 cycles of 94 °C 
for 30 s, 53 °C for 40 s and 72 °C for 40 s, with a final extension step of 10 min at 72 °C. 
Primers for the investigated barcoding regions are shown in Table 2. The matK1F/2R 
oligos were used in Cedrus (Wang et al. 1999). PCR products were cleaned with Illustra 
DNA/Gel Band Purification Kit (GE Healthcare). Standard aliquots were submitted to 
Macrogen Inc. (http://www.macrogen.com) for sequencing. Electropherograms were 
edited with CHROMAS 2.3 (http://www.technelysium.com.au) and checked visually. 
Table 2. Primer list. 

Marker region  Direction  Primer sequence              Reference
rbcL           Fw         ATGTCACCACAAACAGAAAC         Kress et al. (2005)
rbcL           Rev        TCGCATGTACCTGCAGTAGC         Kress et al. (2005)
trnH-psbA      Fw         CGCGCATGGTGGATTCACAATCC      Shaw et al. (2007)
trnH-psbA      Rev        GTTATGCATGAACGTAATGCTC       Shaw et al. (2007)
matK_KIM       Fw         CGTACAGTACTTTTGTGTTTACGAG    Kim (unpublished)
matK_KIM       Rev        ACCCAGTCCATCTAAATCTTGGTTC    Kim (unpublished)
matK1F/2R      Fw         GAACTCGTCGGATGGAGTG          Wang et al. (1999)
matK1F/2R      Rev        TAAACGATCCTCTCATTCACGA       Wang et al. (1999)

Bioinformatics tools 

Sequences were aligned with MEGA5 (Tamura et al. 2011) and checked by eye. Haplotypes 
were defined with BLASTClust v2.2.20 (http://toolkit.tuebingen.mpg.de/blastclust) 
with the following command line: blastclust -i infile -o outfile -p F -L 1 
-b T -S 100, so that only sequences with 100% identity and full 
length coverage were clustered together. All species presenting only unique (species-specific) 
haplotypes were considered efficiently discriminated; those sharing at least one haplotype 
with another species were considered not discriminated. 
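
The haplotype criterion above can be illustrated with a minimal Python sketch. It is not the BLASTClust run used in the study: it assumes that two sequences belong to the same haplotype only when their strings are identical, which approximates the 100% identity and full length coverage setting. The function name and the toy sequences are hypothetical.

```python
from collections import defaultdict


def classify_species_by_haplotype(records):
    """Split species into (discriminated, not_discriminated).

    records: iterable of (species_name, sequence) pairs.
    Two sequences are treated as the same haplotype only if their strings
    are identical (a simplification of BLASTClust's 100% identity and full
    length coverage; this does not call BLASTClust).
    """
    species_per_haplotype = defaultdict(set)
    for species, sequence in records:
        species_per_haplotype[sequence.upper()].add(species)

    all_species = set()
    not_discriminated = set()
    for species_set in species_per_haplotype.values():
        all_species |= species_set
        if len(species_set) > 1:  # haplotype shared between species
            not_discriminated |= species_set
    return all_species - not_discriminated, not_discriminated


# Hypothetical toy sequences, not data from this study.
toy_records = [
    ("Species A", "ACGTACGT"),
    ("Species A", "ACGTACGA"),
    ("Species B", "ACGTACGA"),  # shared with Species A
    ("Species C", "TTGTACGT"),
]
discriminated, not_discriminated = classify_species_by_haplotype(toy_records)
```

In this toy input, Species A owns one unique haplotype but still counts as not discriminated, because a single shared haplotype is enough to preclude discrimination under the rule stated above.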

Species discrimination power of the investigated loci was also assessed using the 
genetic distance approach, to evaluate whether the amount of variation displayed 
was sufficient to discriminate sister species without affecting their correct assigna- 
tion through intraspecific variation. This approach is at the basis of the "barcoding 
gap" definition, i.e. the assumption that the amount of sequence divergence within 
species is smaller than that between species. Uncorrected p-distance matrices of 
sequence divergences within and among congeneric species were calculated for each 
gene fragment and for the two joined markers (rbcL + trnH-psbA), with MEGA5. 
All the species presenting a minimum interspecific distance value higher than their 
maximum intraspecific distance were considered successfully discriminated (Meyer 
et al. 2008). 
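
As a hedged illustration of the distance criterion, the sketch below computes uncorrected p-distances on pre-aligned sequences and derives the barcoding gap of one species as the minimum interspecific distance minus the maximum intraspecific distance. It is not the MEGA5 procedure itself: the function names are hypothetical, alignment columns containing a gap character are skipped pairwise (an assumption about gap handling), and the toy alignment is invented.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of differing sites between two
    aligned sequences, skipping columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def barcoding_gap(target, alignment):
    """Return (max_intra, min_inter, gap) for the target species, or None
    when it has a single accession or no congeneric species to compare.

    alignment: dict mapping species name -> list of aligned sequences,
    assumed to contain congeneric species only.
    """
    own = alignment[target]
    intra = [p_distance(a, b) for i, a in enumerate(own) for b in own[i + 1:]]
    inter = [
        p_distance(a, b)
        for species, seqs in alignment.items()
        if species != target
        for a in own
        for b in seqs
    ]
    if not intra or not inter:
        return None
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Hypothetical toy alignment, not data from this study.
toy_alignment = {
    "Species A": ["ACGTACGTAC", "ACGTACGTAC"],
    "Species B": ["ACGTACGTAA", "ACGTACGTAA"],
}
result = barcoding_gap("Species A", toy_alignment)
```

Under the criterion stated above, a species is considered discriminated only when the returned gap is strictly positive, that is, when its minimum interspecific distance exceeds its maximum intraspecific distance.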

Finally, we simulated a barcode identification scenario using each sequence as an 
unknown query and GenBank (http://www.ncbi.nlm.nih.gov) as global reference da- 
tabase. The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) was 
screened to assess the presence of the investigated species set in GenBank for the 
markers under study. The identification ability of every single marker was evaluated 
using the megaBLAST algorithm (http://blast.ncbi.nlm.nih.gov) with default param- 
eters and adjusted to retrieve 5000 sequences. A query sequence was considered as suc- 
cessfully identified if the top Bit-score obtained in GenBank matched the name of the 
species (Ross et al. 2008). Identification success was only inferred for species/sequences 
represented in GenBank. When more than one species shared a top Bit-Score or the 
species scored lower, the result was considered an identification failure. 
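
The decision rule for the simulated GenBank identification can be sketched as follows. This is only an illustration under stated assumptions: the hit list is assumed to be already parsed into (species name, bit score) pairs, the `in_reference` flag stands in for the NCBI Taxonomy screening, and the function name and example scores are hypothetical. No BLAST call is made.

```python
def identification_outcome(query_species, hits, in_reference=True):
    """Classify one query as 'success', 'failure' or 'not assessed'.

    hits: list of (species_name, bit_score) pairs from a similarity search.
    in_reference: whether the query species has reference sequences for
    this marker; success is only inferred for represented species.
    """
    if not in_reference:
        return "not assessed"
    if not hits:
        return "failure"
    top_score = max(score for _, score in hits)
    top_species = {name for name, score in hits if score == top_score}
    # Success requires that the query species alone holds the top bit score.
    return "success" if top_species == {query_species} else "failure"


# Hypothetical hit lists, not results from this study.
unique_top = [("Species A", 1200.0), ("Species B", 1180.0)]
shared_top = [("Species A", 1200.0), ("Species B", 1200.0)]
lower_rank = [("Species B", 1200.0), ("Species A", 1150.0)]
```

With these inputs, `unique_top` gives a success for a Species A query, whereas `shared_top` and `lower_rank` both give failures, matching the two failure cases described above.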
Results 

Markers' main features 

Optimal amplification rates were obtained with rbcL and trnH-psbA which produced 
clear, single-banded PCR products from all 68 investigated samples (136 sequences; 
100% efficiency). MatK was not consistently amplified in the Pinaceae and Rosaceae 
(44.1% of the investigated dataset) and was therefore not included in further analyses. All 
rbcL electropherograms were easily read and analysed. Conversely, the very long poly- 
nucleotide repeats in the trnH-psbA regions of Sambucus sp. made subsequent traces 
hardly readable. Consequently, in this genus the entire sequences were completed by 
joining partial bidirectional reads (Kress and Erickson 2007). The alignment of rbcL 
sequences was straightforward with a consensus of 688 bp (no indels found). The 
trnH-psbA sequences varied greatly in length, ranging from 396 bp (Sorbus and Crataegus 
spp.) to 622 bp (Cedrus spp.). Numerous gaps were observed in this region. An indel 
of 45 bp turned out to be diagnostic to discriminate the two Aesculus species, an indel 
of 55 bp discriminated Fraxinus ornus from F. excelsior and F. angustifolia, one of 66 bp 
discriminated Sambucus ebulus from S. racemosa and other indels (20-22 bp) were diag- 
nostic for Sorbus torminalis and Cedrus deodara. Shorter gaps (1-19 bp) were detected 
intraspecifically in all species except in Punica, Ziziphus and Ilex. All sequences have 
been deposited in GenBank under accession numbers HG765031-HG765098 (rbcL), 
and HG764963-HG765030 (trnH-psbA). 



Markers' discrimination ability 

The alignment-free method implemented in BLASTClust produced for each marker 
the haplotypes shown in Table 3. Based on the uniqueness of sequence character states, 
trnH-psbA generated a total of 43 haplotypes, 35 of which could be ascribed to single 
species. Common haplotypes were displayed by 14 individuals of the following 
species pairs, thus preventing their discrimination: Fraxinus angustifolia - F. excelsior 
(three samples), Crataegus monogyna - C. oxyacantha (four samples), Sorbus aucuparia 
- S. domestica (two samples), Ilex aquifolium - I. latifolia (five samples). Consequently, 
trnH-psbA discrimination ability was 79.4% of the investigated plants, corresponding 
to 66.7% of the species in the total dataset, 63.6% considering only those genera in 
which at least one species pair was sampled. 
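
The three percentages can be retraced with a short calculation. The sketch below is one reading of how they arise, not a statement of the authors' procedure: it assumes that the eight species in the four listed pairs are the only undiscriminated ones, and that the third figure excludes the two genera represented by a single species (Punica and Ziziphus).

```python
# Counts taken from the text: 68 samples, 24 species, 14 samples sharing
# haplotypes across four species pairs (8 species).
samples_total = 68
samples_sharing = 14
species_total = 24
species_sharing = 8
single_species_genera = 2  # assumption: Punica and Ziziphus are excluded

pct_samples = round(100 * (samples_total - samples_sharing) / samples_total, 1)
pct_species = round(100 * (species_total - species_sharing) / species_total, 1)

species_in_multi_species_genera = species_total - single_species_genera
pct_species_paired = round(
    100 * (species_in_multi_species_genera - species_sharing)
    / species_in_multi_species_genera,
    1,
)
```

Under these assumptions the three values are 79.4, 66.7 and 63.6, in agreement with the figures reported above.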

RbcL displayed a much lower sequence differentiation (with a total of 31 haplotypes, 
12 of which were shared between species). No haplotypes were shared among 
species from different genera. The two-marker combination did not improve markedly 
the discrimination efficacy displayed by trnH-psbA alone. 

In this study, the two potential DNA barcodes displayed different levels of intra- 
and interspecific distances. With rbcL, all intraspecific uncorrected p-distances were 
zero, except in Cedrus atlantica (0.0014), Sorbus aria (0.0014), S. aucuparia (0.0028), 
Crataegus monogyna (0.0028), and Sambucus ebulus (0.004). Zero interspecific distances 
were detected between individuals belonging to Sorbus aucuparia and S. domestica, 
among the three Crataegus species, the three Fraxinus species, between Sambucus nigra 
and S. ebulus, and between the two Ilex species. Conversely, no intraspecific sequence 
variation was found at trnH-psbA in Cedrus deodara, C. libani, Sorbus torminalis, 
Crataegus monogyna, C. oxyacantha, Fraxinus angustifolia, Sambucus racemosa, Passiflora 
edulis, Punica granatum, Ziziphus jujuba and the two Ilex species. Interspecific genetic 
differences produced by this marker exhibited values higher than zero (0.0018-0.0298) 
only in five species belonging to Cedrus, Aesculus and Passiflora genera, and in Fraxinus 
ornus and Sambucus racemosa. 

Table 3. Haplotypes generated by BLASTClust in the investigated dataset with both markers and their 
combination. Shaded: species where unique haplotypes (either single or in combination) were detected. 

[Table 3 body: the per-species rows are not legible. Columns: Genus; Species; Samples; Unique haplotypes (rbcL, trnH-psbA, Combined); Interspecies shared haplotypes (rbcL, trnH-psbA, Combined). Shared haplotypes are n.d. for Punica granatum and Ziziphus jujuba. Total row: 68 samples; unique haplotypes 19 (rbcL), 35 (trnH-psbA), 36 (Combined); interspecies shared haplotypes 12 (rbcL), 8 (trnH-psbA), 8 (Combined).] 

The values of the maximum intra- and minimum interspecific sequence divergence 
of the two combined barcoding loci are shown in Table 4 (all interspecific distances 
involve congeneric species). In agreement with data based on the single markers, 
non-overlapping intra- and interspecific distances were observed in a few species groups. As 
such, barcoding gaps were observed in Cedrus deodara and C. libani, Sorbus torminalis, 
and the two Aesculus species. All remaining taxa displayed equal (e.g. in Cedrus atlantica) 
or higher values of intra- than interspecific divergence (e.g. in Passiflora incarnata, 
Fraxinus ornus, Sorbus aria). Several species showed sequences involving zero interspecific 
divergence (e.g. Sorbus domestica, S. aucuparia, Fraxinus excelsior, F. angustifolia, 
Sambucus nigra, S. ebulus, Crataegus spp.). The lack of additional conspecific samples 
did not allow a comparison with the high levels of interspecific divergences shown by 
two species (Passiflora edulis and Sambucus racemosa). These results suggest that there is 
a barcoding gap in only five out of 19 analyzed species, corresponding to 26.3% of our 
dataset (taxa with only one individual/species or one species/genus excluded). 

Angeliki Laiou et al. / ZooKeys 365: 197-213 (2013) 

Table 4. Values of maximum intra- and minimum interspecific uncorrected p-distances resulting from 
the combination of rbcL + trnH-psbA sequences, and relative barcoding gaps calculated in 24 forest tree 
taxa; n.d. = not determined; * = no sister species included in the dataset; ** = taxa with single accession. 
Shaded: species where a barcoding gap was detected. 

Species                Samples  Max. intrasp. distance  Min. intersp. distance  Barcoding gap
Sambucus nigra         5        0.0017                  0                       -0.0017
Sambucus ebulus        2        0.0101                  0                       -0.0101
Sambucus racemosa**    1        n.d.                    0.0142                  n.d.
Passiflora incarnata   2        0.02397                 0.01588                 -0.0081
Passiflora edulis**    1        n.d.                    0.0158                  n.d.
Punica granatum*       4        0                       n.d.                    n.d.
Ziziphus jujuba*       3        0                       n.d.                    n.d.
Ilex aquifolium        4        0                       0                       0
Ilex latifolia**       1        n.d.                    0                       n.d.

[Rows for the Cedrus, Sorbus, Crataegus, Aesculus and Fraxinus species are not legible.] 

The NCBI Taxonomy database screening revealed that all the species in our data- 
set were represented by rbcL and trnH-psbA marker sequences in the database, except 
for Aesculus indica, Cedrus libani (neither marker), Crataegus azarolus and Sorbus do- 
mestica (only rbcL present). 
When BLASTed to GenBank, all our rbcL sequences were identified by the reference 
sequences at the genus level (87.5% of total taxa), or even at the species level 
(41.6%). Genus misidentification occurred in the three Crataegus species, for which 
the genera Cotoneaster, Pyrus, Pyracantha, Amelanchier and Chaenomeles (all belonging to the 
Rosaceae family) were best matches alongside Crataegus. In contrast, correct genus 
and species identifications were obtained for Ilex aquifolium, Passiflora incarnata and 
P. edulis, Punica granatum, Ziziphus jujuba, Sambucus nigra, Sorbus torminalis, Cedrus 
atlantica and C. deodara. 

TrnH-psbA was outperformed by rbcL, since none of the Sorbus sequences (four 
species) matched the right genus, and only eight species (33.3%) were correctly identified 
(Fraxinus ornus, Passiflora incarnata, Punica granatum, Ziziphus jujuba, Sambucus 
racemosa, Cedrus atlantica and C. deodara). All other samples shared the highest score 
with other species (e.g. Aesculus hippocastanum with A. turbinata, Fraxinus excelsior with 
F. angustifolia, Sambucus nigra with S. racemosa, Crataegus monogyna with several other 
species), or even hit the wrong species (e.g. Ilex aquifolium, Sambucus ebulus, Crataegus 
oxyacantha). The four taxa not represented in GenBank (Cedrus libani, Aesculus indica, 
Crataegus azarolus and Sorbus domestica) were assigned to the correct genus. As a final 
result, only 11 species were correctly identified by the two-locus combination, corresponding 
to 55% of the investigated species having a reference in GenBank (45.8% 
of the total species set). A summary of the correct species identifications achieved with 
the three discrimination methods used in the present study is shown in Table 5. Thirteen 
species (54.2% of our dataset) were identified by at least two methods. Only two 
species (Cedrus deodara and Sorbus torminalis) were identified with the three methods, 
whereas the absence of conspecific GenBank references prevented the same full identification 
for Cedrus libani and Aesculus indica. In contrast, six species (corresponding to 
three species pairs and totalling 25% of our dataset) appeared unidentifiable with any 
method: Crataegus monogyna, C. oxyacantha, Sorbus aucuparia, S. domestica, Fraxinus 
angustifolia, F. excelsior. Two species (Crataegus azarolus and Sorbus aria) were discriminated 
only by means of sequence specificity but received no support from either of 
the other two approaches (the former was absent in GenBank). 



Discussion 

Marker applicability 

In our dataset, the rbcL + trnH-psbA combination showed the highest amplification 
and sequencing success (100%), whereas matK showed a much lower success (55.9%). 
Specifically, the currently most adopted primer set for Angiosperms (matK_KIM) 
failed in the amplification of the Rosaceae, and the matK1F/2R primers, suggested for the 
Pinaceae, failed to amplify Cedrus sp. In addition, matK also revealed severe difficulties 
in the amplification and/or sequencing steps in the genera Berberis (Berberidaceae), 
Vitex (Rhamnaceae), Cercis (Leguminosae) and Ginkgo (Ginkgoaceae), in the ongoing 
continuation of this work. The lack of universality of matK was already reported by e.g. 
Kress and Erickson (2007), Fazekas et al. (2008), Ford et al. (2009), De Mattia et al. 
(2012). MatK_KIM (Kim, unpublished) is still considered the primer set with the 
highest match for eudicots, while matK1F/2R was efficiently used in a comprehensive 
study across Pinaceae (Wang et al. 1999). Dunning and Savolainen (2010) also 
noted that matK_KIM is not the best choice for Rosaceae and rather suggested the use 
of specific primer sets. The difficulty of defining the best primer choice for matK in 
Conifers was already faced by e.g. Li et al. (2011) and Armenise et al. (2012). When 
applied to international trade and safe use of medicinal plants, matK yielded 54.0% 
amplification efficiency in Chen et al. (2010), whereas Kool et al. (2012) produced 
PCR products for less than 30% of the specimens, and sequencing success was only 
10% in Wallace et al. (2012). 

Table 5. Summary of the species identification success achieved with rbcL + trnH-psbA and the three 
discrimination methods in the present study: occurrence of unique haplotypes in the total species set, 
genetic distances among and within congeneric species, correct species match in the GenBank database. 
Green: correct identification; red: non confident/wrong identification; shaded = not determined (no 
intra- or interspecific samples investigated); a = species absent in GenBank with either one or both markers. 

In contrast, trnH-psbA provided better discrimination than matK in many diverse 
tree genera such as Alnus (Roy et al. 2010), Ficus (Ren et al. 2010), Quercus (Simeone 
et al. 2013), and more generally in Angiosperms (Pang et al. 2012). Nevertheless, 
matK is still recommended by the CBOL Plant Working Group (2009) as the first option 
to rely on in terms of sequence variability. We therefore suggest that an efficient 
barcoding workflow should begin with a preliminary screening with matK universal 
primer set(s) and then, depending on the amplification results, add trnH-psbA 
as an additional marker to rbcL. Alternatively, a simple and clear morphological trait 
may be included in the analysis, or the search for the most appropriate matK 
primer set may be guided by the biological group under study (Bruni et al. 2012, Dunning and 
Savolainen 2010). 



Species identification and discrimination 

The BLASTClust analysis yielded 66.7% species discrimination, which is slightly lower than, 
but still in line with, the general limit acknowledged for land plants when markers from 
a single genetic linkage group are used (ca. 70%; CBOL Plant Working Group 2009). 
In agreement, similar percentages (68-71%) were obtained in broader taxonomic in- 
vestigations in forests of North and meso-America (Fazekas et al. 2008, Gonzalez et al. 
2009), although by use of a different way to assess species identification success (i.e. 
support for species monophyly through barcodes). Our barcoding data, dedicated to 
woody plants sampled in a different ecological zone, approach Piredda et al. (2011), 
who reported 73% efficiency in a floristic investigation of the Italian tree flora by 
means of sequence specificity; nevertheless, more intraspecific diversity and more spe- 
cies pairs were surveyed in the present work. 

The highest identification success was achieved with the analysis based on the 
uniqueness of sequence character states, where some parts in the haplotypes (espe- 
cially some trnH-psbA indels) appeared diagnostics for certain species. However, 
more data are required to confirm these diagnostic sequence features. Yet, if con- 
firmed, these features may be important in view of the generally low interspecific 
divergences we observed. Conversely, the analysis with the barcoding gaps suggests 
that such a discrimination approach may yield a lower efficiency, at least with trnH- 
psbA, since the uncorrected p-distance analysis removed all indels. A further com- 
plication we encountered was constituted by the high intraspecific divergences (e.g. 
in C. atlanticd) and the sharing of haplotypes among congeneric species (e.g. in 
Sorbus, Crataegus, Fraxinus, Sambucus). All these results challenge the application 
of DNA barcoding with rbcL + trnH-psbA in the taxa investigated here. This is the 
more so as GenBank also showed a low identification efficiency and sometimes lead 
to erroneous identifications, most often due to the limited number of available refer- 
ence sequences and their sometimes very high intraspecific divergences. Little and 
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Stevenson (2007) and Ross et al. (2008) found that BLAST (and other similarity 
methods) can give accurate identifications on GenBank (see also de Vere et al. 2012 
and Pang et al. 2012), although some distorted results, in inverse proportion to the 
number of reference sequences per species in the databases, may render these ap- 
proaches inappropriate. Ideally, a reference library should provide multiple samples 
from unambiguously identified species or taxa, and cover intraspecific variability and 
closely related species to evaluate the degree of divergences among barcodes. Unfor- 
tunately, the reference list in the GenBank database is still far from complete. The 
small numbers of available sequences per species and for either marker prevented us 
from confidently retrieving correct species names in Aesculus hippocastanum, Fraxi- 
nus excelsior, Ilex latifolium, Crataegus monogyna (highest scores shared with other 
congenerics). Moreover, it led us to assign a query to the wrong species, as in
the cases of Aesculus indica (A. pavia), Fraxinus angustifolia (F. excelsior), Passiflora
edulis (P. incarnata), Sambucus ebulus (S. adnata), Crataegus azarolus and C. oxyacantha
(C. monogyna), Cedrus libani (C. deodara), and the four Sorbus species. Clearly, a
consistent enrichment of the reference databases is a priority for future applications 
of DNA barcoding. 
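
The effect of indel removal on the barcoding gap noted above can be illustrated with a minimal sketch. It assumes pairwise deletion of gapped sites when computing uncorrected p-distances, which may differ in detail from the settings used in this study. The function names and the toy alignment are hypothetical and are not taken from our dataset.

```python
from itertools import combinations

def p_distance(a, b, gap="-"):
    """Uncorrected p-distance with pairwise deletion of gapped sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only alignment columns where neither sequence has a gap.
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)

def barcoding_gap(alignment):
    """alignment maps (species, specimen) -> aligned sequence.
    Returns (max intraspecific distance, min interspecific distance)."""
    intra, inter = [], []
    for (k1, s1), (k2, s2) in combinations(alignment.items(), 2):
        d = p_distance(s1, s2)
        (intra if k1[0] == k2[0] else inter).append(d)
    return max(intra, default=0.0), min(inter, default=0.0)

# Hypothetical toy alignment: the two species differ only by a 3-bp indel.
toy = {
    ("sp_A", 1): "ACGTTTACGA",
    ("sp_A", 2): "ACGTTTACGA",
    ("sp_B", 1): "ACG---ACGA",
    ("sp_B", 2): "ACG---ACGA",
}
max_intra, min_inter = barcoding_gap(toy)
# A barcoding gap exists only if every interspecific distance exceeds
# every intraspecific one.
has_gap = min_inter > max_intra
```

In this toy case the indel is fully diagnostic as a character state, yet the two species are at distance zero once gapped sites are dropped, so no barcoding gap is detected.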



DNA barcoding of medicinal and aromatic plants 

DNA barcoding is a substantial improvement of our capacity to document the existing 
biodiversity. It is also a powerful research complement for human socio-economics, 
safety, trade control, fraud discovery and detection of forgeries in commercial plant
products (Newmaster and Ragupathy 2010). Kool et al. (2012), for example, were able
to document 18 misidentifications and eight forgeries among 111 samples of medicinal
plants in a local market in Marrakech (Morocco).

The Mediterranean woody flora comprises numerous valuable species used as or- 
namentals or for secondary products processing and marketing (edibles, essential oils, 
medicinal compounds). Field identification, authentication and certification of germ- 
plasm and raw materials are a major concern. As such, our results on Cedrus support 
previous findings that members of Pinaceae can be efficiently barcoded with rbcL + 
trnH-psbA (at least at a regional scale; Armenise et al. 2012). The genus Cedrus comprises four
extant species: the three most widespread, which have great ornamental,
ecological and cultural relevance, were discriminated here, while Cedrus brevifolia, a
highly protected, rare endemic surviving in only one population in the Troodos Mountains
(Cyprus), still awaits further investigation. We also found specific haplotypes
for the highly important and largely cultivated Punica granatum. In this case as well, 
further investigations involving the only other species of genus Punica (Punica pro- 
topunica, a rare endemic of the Socotra Island, Yemen, very similar in morphology, 
production of fruits and secondary metabolites) would eventually provide new tools 
for its conservation and management. 
On the other hand, we confirm the difficulties previously encountered in barcoding
Fraxinus (Arca et al. 2012) and the extensive interspecific haplotype sharing in
Crataegus (Fineschi et al. 2005) and Sorbus (Robertson et al. 2010). For instance, Burgess
et al. (2011) were able to discriminate only one out of four Crataegus species with
five barcoding markers. Indeed, these genera are likely to be as refractory to barcoding
as other woody groups including oaks (Piredda et al. 2011) and willows (von Crautlein
et al. 2011). Low mutation rates, incomplete lineage sorting and hybridization are the
most frequently reported causes (Hollingsworth et al. 2011). However, we were able to
discriminate Fraxinus ornus, a very important medicinal and industrial plant, and Crataegus
azarolus, a protected fruit tree, historically used for a number of medicinal purposes. 
Conversely, we were unable to discriminate the Crataegus monogyna - C. oxyacantha 
species pair (see also Bruni et al. 2012), but this has little practical importance since 
both hawthorns are equally used for the same medicinal purposes. Very promising data 
were collected on Sorbus aria and S. torminalis, Ilex aquifolium, Aesculus hippocastanum,
Passiflora and Ziziphus jujuba, suggesting that efficient barcoding could be
achieved for these species, at least at regional scales. In contrast, Sambucus sp. showed
a large intraspecific divergence and requires further investigation on larger datasets.
More recently, the nuclear ribosomal ITS (especially the ITS2 portion) has been sug- 
gested as an efficient barcoding locus for complex plant groups (Chen et al. 2010). 
However, Kool et al. (2012) could not use this marker in 45% of their dataset because 
of the low amplification and sequencing efficacy detected and fungal contamination, 
particularly in the root material. Therefore, this marker is not yet free of
pitfalls and will certainly require improvement of current protocols.



Conclusion 

Research interest in DNA barcoding of regional floras of biological and/or economic
relevance has grown markedly in recent years. In the present work, we lay the
foundations towards DNA barcoding applications of important woody plant genera in 
the Mediterranean basin, such as Cedrus, Aesculus, Ilex, Passiflora, Punica, Sambucus,
Sorbus and Ziziphus. All these genera include taxa valuable for multiple natural and economic
purposes, and our data complement similar DNA barcoding investigations performed
on Euro-Mediterranean forested land in recent years (Piredda et al. 2011, von Crau- 
tlein et al. 2011, Armenise et al. 2012, Simeone et al. 2013). Gathered results expose 
limitations of DNA barcoding, most of which are due to (1) the imperfect discrimina- 
tion ability of the markers and methods currently in use, (2) the biological peculiari- 
ties of some genera, and (3) the low taxonomic coverage of the reference databases. 
Future technological advances, additional markers and larger sample sets at different
geographical scales (from continental to local) are therefore needed to improve current
protocols and identification success for the practical conservation and valorisation
of forest natural resources. 